ABSTRACT



A discussion of performance contracting, defined as 
an agreement between a group offering instruction and a school 
needing the services, is presented. Four major hazards to direct 
measurement of specific learning are considered: poor statement of
objectives; selection of the wrong tests; misinterpretation of test
scores; and depersonalization of contemporary life. These and other 
problems such as human and testing error, valid criterion testing, 
and the question of when to test, are discussed in full. The 
relationship of these hazards of performance measurement to 
performance contracting, and to regular school programs, is 
presented. (AG) 
(With a Special Look at Performance Contracting)
Robert E. Stake 

Center for Instructional Research and Curriculum Evaluation 
University of Illinois at Urbana-Champaign





"Can there be teaching if there is no learning?" Hear again one of the
lines from the educator's catechism. The question is not to be taken literally. 
Good teaching, elegant teaching, without student benefit, of course is possible— 
though doubly wasteful. The question is rhetorical. Professionals and laymen 
alike sanctify that teaching-learning contract that results in better student 
performance.

Measuring the learning is no small problem. Teachers, as a matter of 
course, usually are able to observe that individual students are or are not 
learning. Sometimes they cannot. And increasingly, outsiders are reluctant to
take the teacher's word for it. Gathering "hard-data" evidence of student
learning is a new and ominous challenge. Of course, we have tests. But the 
results of our testing have seldom been adequate grounds for the continuing faith 
we have in education. 

Present Demands. Expectations of testing are on the rise because
schools have been told to be accountable: to demonstrate publicly what they are
accomplishing (Lieberman, 1970; Bhaerman, 1970). Increasing educational costs
and increasing frustration with social and political problems have brought
higher demands for answers to an important question: What are we getting for
our education dollar?

Educators have been challenged to become more explicit and more
functional in lesson plans and school budgets; to identify the gains and losses
children make in reading, singing, and the many human talents; and to realize
that the events of the classroom are not unrelated to the events of the street,
the marketplace, and City Hall (Cohen, 1970). Educators have been told to learn
about systems analysis, operations research, cost-benefit analysis, program
planning and budgeting, and other models for orderly and dispassionate treatment
of institutional affairs (Lessinger, 1970).

*A paper prepared with financial support from the National Educational Finance
Project and the Office of the Superintendent of Public Instruction, State of
Illinois.

Some critics of contemporary education are bothered greatly by the fact 
that educational practice is so intuitive, impulsive, inefficient, and resistant 
to change. Others continue to be bothered more by passionate but naive efforts 
to substitute technical procedures for personal attention. Thorndike (1921), 
Tyler (1950), and Krathwohl (1969) have been persuasive advocates of a more 
rational, explicit, performance-oriented school. But Atkin (1968), Oettinger 
(1969), and Dyer (1970) have cautioned that formal analyses and production models 
can be narrow, irrelevant, and even oppressive. It is safe to say that all 
specialists in testing and instruction believe that it is possible to measure 
many specific educational outcomes and to use such measurements in improving 
educational decisions. But a few of these same specialists are among the most 
vehement critics of present testing (Glaser, 1963; Grobman, 1971). 

Tests for Performance Contracts. The performance contract is an agreement
between a group offering instruction and a school needing services (Lennon,
1971). Reimbursement is to be made in some proportion to measured student
achievement. Especially for children having special needs, such as nonreading,
handicapped, or gifted children, a new way of getting special instruction is
appealing. A "hard-data" basis for evaluating the quality of instruction is
appealing. In performance contracting, student gains are the criterion of
successful teaching.
In the first federally sponsored example of performance contracting for
the public schools, Dorsett Educational Systems of Norman, Oklahoma, contracted
to teach reading, mathematics, and study skills to over 200 poor-performance
junior and senior high school students in Texarkana, Texas. Commercially
available, standardized tests were used to measure performance gains.

Are such tests suitable for measuring specific learnings? To the
person not intimately acquainted with educational testing it appears that
performance testing is what educational tests are for. The testing specialist
knows that this is not so. These tests have been developed and administered to
measure correlates of learning, not learning itself.

Most tests are indirect measures of educational gains, correlates of
learning rather than direct evidence of achievement. Correlation with important
general learning is often high, but correlation of test scores with performance
on many specific educational tasks is seldom high. Tests can be built for
specific competence, but there is relatively little demand for them and many of
them do a poor job of predicting later performance of either a specific or
general nature. General achievement tests "predict" better. The test developer's
basis for improving tests has been to work toward better predictions of later
performance rather than better measurement of present performance. Assessment of
what a student is now capable of doing is not the purpose of most standardized
tests. Especially when indirect-measurement tests are used for performance
contracting, but even with direct-assessment tests, errors and hazards abound.

In this paper I will identify the major obstacles to direct measurement 
of the specific things that learners learn.
The Errors of Testing

Answering a National School Board Journal (November 1970) questionnaire
on performance contracting, a New Jersey board member said, "Objectives must be
stated in simple, understandable terms. No jargon will do and no subjective
goals can be tolerated. Neither can the nonsense about there being some mystique
that prohibits objective measurement of the educational endeavor." Would that
our problems would wither before stern resolve. But neither wishing nor
blustering rids educational testing of its errors. They exist.

Just as the population census and the bathroom scales have their
errors, educational tests have theirs. The technology and theory of testing are
highly sophisticated; the sources of error are well known (Lindquist, 1951;
Cronbach, 1969). Looking into the psychometrist's meaning of "A Theory of
Testing," one finds a consideration of ways to analyze and label the inaccuracies
in test scores (Lord, 1952). There is mystique, but there is also simple fact:
No one can eliminate test errors. Unfortunately, some errors in testing are
large enough to cause wrong decisions about individual children or about
school-district policy.
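
How an error of ordinary size can lead to a wrong decision about one child may be easier to see with numbers. The sketch below uses the standard error of measurement from classical test theory (score standard deviation times the square root of one minus reliability). It is an illustration only, not a procedure taken from this paper: the function names, the score scale, the reliability of 0.91, the observed score of 48, and the cutoff of 50 are all hypothetical.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.96) -> tuple:
    """Approximate 95 percent band around an observed score."""
    return (observed - z * sem, observed + z * sem)


# Hypothetical values, chosen only for illustration.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)
low, high = score_band(observed=48.0, sem=sem)
cutoff = 50.0  # hypothetical pass mark

print(f"SEM = {sem:.2f}")
print(f"band = {low:.2f} to {high:.2f}")
print(f"cutoff falls inside the band: {low <= cutoff <= high}")
```

Under these assumed values the band around an observed score of 48 runs from roughly 42 to 54, so a pupil who scores just under a cutoff of 50 cannot confidently be said to have fallen short of it.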

The whole idea of educational testing is thought to be an error by some 
educators and social critics (Hoffman, 1962; Holt, 1969; Silberman, 1970; Sizer,
1970). Bad social consequences of testing, such as the perpetuation of racial
discrimination (Goslin, 1970) and pressures to cheat (McGhan, 1970) continue to 
be discussed. But, as would be expected, most test specialists believe that the 
promise in testing outweighs these perils. They refuse responsibility for gross 
misuse of their instruments and findings; and they concentrate their attention 
on reducing the errors in specific tests and test programs (Lennon, no date). 
Some technical errors in test scores are small and tolerable. But some
testing errors are intolerably large. Today's tests can, for example, measure
vocabulary word-recognition skills with sufficient accuracy. Today's tests cannot
adequately measure listening comprehension or the ability to analyze the opposing
sides to an argument.

Today's test technology is not refined enough to meet all the demands
put on it. The tests are best when the performance is highly specific, as when
the student is asked to add two numbers, recognize a misspelled
word, or identify the parts of a hydraulic lift. When a teacher wants to measure
performances calling for the higher mental processes (Bloom et al., 1956), such
as generating a writing principle or synthesizing a political argument, our
tests give us scores that are less dependable. See Table 1 for several examples.
Table 1

Examples of Items of High and Low Validity 
in Conventional Standardized Achievement Tests 

High validity ("basic mental process") items:

*1. Which one of the following phrases about wave motion defines period?
    a. the maximum distance a particle is displaced from its point of rest
    b. the length of time required for a particle to make a complete vibration
    c. the number of complete vibrations per second
    d. the time rate of change of distance in a given direction

*2. Directions: In each group below, select the numbered word or phrase which
    most nearly corresponds in meaning to the word at the head of that group,
    and put its number in the parenthesis at right.

    ( ) antelope
    a. fruit  b. animal  c. prelude  d. feeler  e. gallop

*3. The first movement of a sonata is distinguished from the others by:
    a. rapidity and gaiety
    b. length and complexity
    c. emotional abandon
    d. sweetness and charm
    e. structural formality

4. Which of these would help you decide whether or not you used the word
   "filter" correctly in a sentence?
    a. encyclopedia
    b. dictionary
    c. thesaurus
    d. English grammar textbook